Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection
نویسندگان
چکیده
For single-channel speech enhancement, mask learning based approach through neural network has been shown to outperform the feature mapping approach, and to be effective as a pre-processor for automatic speech recognition. However, its assumption that the mixture and clean reference must have the correspondent scale doesn’t hold in data collected from real world, and thus leads to significant performance degradation on parallel recorded data. In this paper, we first extend the mask learning based speech enhancement by integrating two types of restoration layer to address the scale mismatch problem. We further propose a novel residual learning based speech enhancement model via adding different shortcut connections to a feature mapping network. We show such a structure can benefit from both the mask learning and the feature mapping. We evaluate the proposed speech enhancement models on CHiME 3 data. Without retraining the acoustic model, the best bidirection LSTM with residue connections yields 24.90% relative WER reduction on real data and 34.57% WER on simulated data.
منابع مشابه
Speech Enhancement using Adaptive Data-Based Dictionary Learning
In this paper, a speech enhancement method based on sparse representation of data frames has been presented. Speech enhancement is one of the most applicable areas in different signal processing fields. The objective of a speech enhancement system is improvement of either intelligibility or quality of the speech signals. This process is carried out using the speech signal processing techniques ...
متن کاملAn Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement
We propose an iterative phase recovery framework to improve spectral mapping with an application to improving the performance of state-of-the-art speech enhancement systems using magnitude-based spectral mapping with deep neural networks (DNNs). We further propose to use an estimated time-frequency mask to reduce sign uncertainty in the overlap-add waveform reconstruction algorithm. In a series...
متن کاملStudent-Teacher Learning for BLSTM Mask-based Speech Enhancement
Spectral mask estimation using bidirectional long short-term memory (BLSTM) neural networks has been widely used in various speech enhancement applications, and it has achieved great success when it is applied to multichannel enhancement techniques with a mask-based beamformer. However, when these masks are used for single channel speech enhancement they severely distort the speech signal and m...
متن کاملA New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
متن کاملImproving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier
An efficient algorithm for speech enhancement in binaural hearing aids is proposed. The algorithm is based on the estimation of a time-frequency mask using supervised machine learning. The standard least-squares linear classifier is reformulated to optimize a metric related to speech/noise separation. The method is energy-efficient in two ways: the computational complexity is limited and the wi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017